Test
Packages loaded for this analysis: tidyverse, tsoutliers, lubridate, hrbrthemes, CausalImpact, googleAnalyticsR, ggrepel, ggalt, gridExtra, broom and knitr. Note that MASS (loaded as a dependency) masks dplyr's select(), and xts masks dplyr's first() and last().
Objective:
Users (ga:users) and new users (ga:newUsers), broken down by date (ga:date), for all traffic arriving through Google/CPC.

| date | users | newUsers |
|---|---|---|
| 2016-09-01 | 27 | 19 |
| 2016-09-02 | 29 | 24 |
| 2016-09-03 | 19 | 13 |
| 2016-09-04 | 28 | 23 |
| 2016-09-05 | 37 | 22 |
| 2016-09-06 | 37 | 24 |
Analytics has no returningUsers metric that gives us the number of returning users, so we derive it as users minus newUsers.
usuarios_desde_google_adwords <- usuarios_desde_google_adwords %>%
  mutate(returningUsers = users - newUsers)
knitr::kable(head(usuarios_desde_google_adwords))

| date | users | newUsers | returningUsers |
|---|---|---|---|
| 2016-09-01 | 27 | 19 | 8 |
| 2016-09-02 | 29 | 24 | 5 |
| 2016-09-03 | 19 | 13 | 6 |
| 2016-09-04 | 28 | 23 | 5 |
| 2016-09-05 | 37 | 22 | 15 |
| 2016-09-06 | 37 | 24 | 13 |
We extract the daily spend on keyword bids in Google AdWords.

| date | adCost |
|---|---|
| 2016-09-01 | 152.06 |
| 2016-09-02 | 161.67 |
| 2016-09-03 | 127.90 |
| 2016-09-04 | 127.92 |
| 2016-09-05 | 153.58 |
| 2016-09-06 | 162.66 |

| date | users | newUsers | returningUsers |
|---|---|---|---|
| 2016-09-01 | 27 | 19 | 8 |
| 2016-09-02 | 29 | 24 | 5 |

| date | adCost |
|---|---|
| 2016-09-01 | 152.06 |
| 2016-09-02 | 161.67 |

usuarios_costes_google_adwords <- left_join(usuarios_desde_google_adwords,
                                            costes_google_adwords, by = "date")

| date | users | newUsers | returningUsers | adCost |
|---|---|---|---|---|
| 2016-09-01 | 27 | 19 | 8 | 152.06 |
| 2016-09-02 | 29 | 24 | 5 | 161.67 |
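
As a minimal sketch of what this join does, the snippet below rebuilds a few rows of each table by hand and joins them with dplyr's left_join. The values for 2016-09-01 to 2016-09-03 come from the tables above; the cost row for 2016-09-03 is deliberately left out here (an assumption made only for illustration) to show that a left join keeps every row of the left table and fills adCost with NA where no match exists.

```r
library(dplyr)

# Users table: first three days, copied from the output above.
usuarios <- data.frame(
  date = as.Date(c("2016-09-01", "2016-09-02", "2016-09-03")),
  users = c(27, 29, 19),
  newUsers = c(19, 24, 13)
)
usuarios$returningUsers <- usuarios$users - usuarios$newUsers

# Costs table: the third day is omitted on purpose for this illustration.
costes <- data.frame(
  date = as.Date(c("2016-09-01", "2016-09-02")),
  adCost = c(152.06, 161.67)
)

# Every row of `usuarios` survives; unmatched dates get NA in adCost.
unidos <- left_join(usuarios, costes, by = "date")
print(unidos)
sum(is.na(unidos$adCost))   # number of days without a cost record
```

Checking for NA values right after the join is a cheap way to spot days on which one of the two extracts has no data.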

| statistic | date | users | newUsers | returningUsers | adCost |
|---|---|---|---|---|---|
| Min. | 2016-09-01 | 1.00 | 0.00 | 1.00 | 0.00 |
| 1st Qu. | 2016-10-17 | 29.25 | 20.00 | 8.25 | 73.57 |
| Median | 2016-12-02 | 42.50 | 27.50 | 13.00 | 169.50 |
| Mean | 2016-12-02 | 50.38 | 36.13 | 14.25 | 187.09 |
| 3rd Qu. | 2017-01-17 | 71.00 | 51.75 | 19.00 | 280.25 |
| Max. | 2017-03-05 | 132.00 | 98.00 | 34.00 | 441.57 |

Every univariate time-series analysis begins with a plot showing how the variable evolves over time.
And now?
plot(density(usuarios_costes_google_adwords$returningUsers),
     main = "Returning users")
boxplot(usuarios_costes_google_adwords$returningUsers,
        main = "Returning users")
plot(usuarios_costes_google_adwords$adCost, usuarios_costes_google_adwords$newUsers,
     main = "Acquisition and ad spend")
How does ad spend influence acquisition?
usuarios_costes_google_adwords %>%
  ggplot(aes(x = adCost, y = newUsers)) +
  geom_point(color = "orange", size = 3, alpha = 0.8)

usuarios_costes_google_adwords %>%
  ggplot(aes(x = adCost, y = newUsers)) +
  geom_point(color = "orange", size = 3, alpha = 0.8) +
  geom_smooth()

usuarios_costes_google_adwords %>%
  ggplot(aes(x = adCost, y = newUsers)) +
  geom_point(color = "orange", size = 3, alpha = 0.8) +
  geom_smooth(method = "lm")

Which variable can we act on to bring about a change in the other?

We call y our dependent variable (the one to be explained), newUsers, and x the independent or explanatory variable, adCost.

y <- usuarios_costes_google_adwords$newUsers
x <- usuarios_costes_google_adwords$adCost
y[1:5]
## [1] 19 24 13 23 22
x[1:5]
## [1] 152.06 161.67 127.90 127.92 153.58
How many new users do we expect to acquire if we ignore ad spend?
\[\hat y = \bar y\]
mean(y)
## [1] 36.12903
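
The mean is the natural baseline here because it is the constant that minimises the sum of squared errors. A small sketch, assuming only the five values of y printed earlier (so the result differs from the full-sample mean of 36.13); the names y5, sse and best_k are introduced for illustration:

```r
# First five values of newUsers, as printed above.
y5 <- c(19, 24, 13, 23, 22)

# Sum of squared errors when every day is predicted with the same constant k.
sse <- function(k) sum((y5 - k)^2)

# Numerical search for the constant with the lowest SSE.
best_k <- optimize(sse, interval = range(y5))$minimum

mean(y5)   # 20.2
best_k     # numerically very close to mean(y5)
```

Any regression model that uses adCost has to beat this baseline to be worth the extra complexity.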
\[\hat y = f(x)\]
\[newUsers = f(adCost)\]
\[\hat y = \beta_0 + \beta_1 x\]
\[ -1 \le r \le 1 \]
cor(x, y)
## [1] 0.918814
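
For reference, cor() computes Pearson's coefficient by default: the covariance of x and y divided by the product of their standard deviations. A sketch on the five observations printed earlier (an assumption made to keep the example self-contained, so the value will not match the full-sample 0.92):

```r
x5 <- c(152.06, 161.67, 127.90, 127.92, 153.58)   # adCost, first five days
y5 <- c(19, 24, 13, 23, 22)                       # newUsers, first five days

# Pearson's r from its definition: co-variation over the product of the spreads.
r_manual <- sum((x5 - mean(x5)) * (y5 - mean(y5))) /
  sqrt(sum((x5 - mean(x5))^2) * sum((y5 - mean(y5))^2))

r_manual
cor(x5, y5)                        # built-in, same value
all.equal(r_manual, cor(x5, y5))   # TRUE
```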
Is the correlation between \(x\) and \(y\) statistically significant?
(ct <- cor.test(x, y))
##
## Pearson's product-moment correlation
##
## data: x and y
## t = 31.578, df = 184, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8929860 0.9386106
## sample estimates:
## cor
## 0.918814
The p-value is below .05 (it is reported as < 2.2e-16), so we can reject \(H_0\) (no correlation in the population) and conclude that the correlation is not 0. The 95% confidence interval (0.893 to 0.939) is also narrow and close to 1, indicating a strong positive correlation.
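
The figures reported by cor.test can be reproduced from r and n alone. The sketch below takes r = 0.918814 and n = 186 (the 184 degrees of freedom plus 2) from the output above. The t statistic is r * sqrt(n - 2) / sqrt(1 - r^2), and the confidence interval uses the Fisher z transformation, which is the approach cor.test takes for Pearson's method.

```r
r <- 0.918814   # sample correlation from the output above
n <- 186        # df = 184, so n = df + 2

# t statistic for H0: the true correlation equals 0.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval through the Fisher z transformation.
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(c(z - half_width, z + half_width))

round(t_stat, 3)   # compare with t = 31.578
p_value            # far below 0.05
round(ci, 4)       # compare with 0.8930 and 0.9386
```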
How much must I invest to acquire one new user?

| newUsers | adCost |
|---|---|
| 19 | 152 |
| 24 | 162 |
| 13 | 128 |
As a first instinct, the CPA, the average price paid per new user, would be \(y = f(x)\), \(cpa = f(newUsers)\). Therefore \(y = \beta x\).
From the data above: \((152 + 162 + 128) / (19 + 24 + 13)\)
(cpa <- sum(x[1:3]) / sum(y[1:3]))
## [1] 7.88625
Cost of acquiring one user: \(CPA = 7.89 \times 1\) (user)
Our final model: \(Users = 1/7.89 \times Spend\)
This is a very simple predictive model. It takes an input value (spend in euros), applies a function (1/7.89 × spend), and returns a result (users).
Its technical name: regression model.
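
Written as code, this ratio model is a one-line function. The sketch below hard-codes the three rows of the table above; the function name predict_users and the vector names are made up for illustration.

```r
spend <- c(152.06, 161.67, 127.90)   # adCost, first three days
new_users <- c(19, 24, 13)           # newUsers, same days

# Average cost per acquired user over those three days.
cpa <- sum(spend) / sum(new_users)

# Ratio model: expected users for a given spend, with no intercept.
predict_users <- function(gasto) gasto / cpa

cpa                         # 7.88625
predict_users(100)          # expected new users for 100 euros
predict_users(sum(spend))   # recovers the 56 users observed in the sample
```

Because this line is forced through the origin, it predicts zero users at zero spend, and it is calibrated on three days only.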
y[1:5] x[1:5]
## [1] 7.782076
new_users.lm <- lm(newUsers ~ adCost, data = usuarios_costes_google_adwords)
summary(new_users.lm)
##
## Call:
## lm(formula = newUsers ~ adCost, data = usuarios_costes_google_adwords)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.6322 -5.8271 -0.2376 5.8206 25.7548
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.81243 1.20903 3.153 0.00189 **
## adCost 0.17274 0.00547 31.578 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.779 on 184 degrees of freedom
## Multiple R-squared: 0.8442, Adjusted R-squared: 0.8434
## F-statistic: 997.1 on 1 and 184 DF, p-value: < 2.2e-16
The model coefficients:
(coeficientes <- coef(new_users.lm))
## (Intercept) adCost
## 3.812432 0.172736
And the intercept term?
\[\hat y = \beta_0 + \beta_1 x\]
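
For simple linear regression the least-squares coefficients have a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x). The sketch below checks those formulas against lm() on the five observations printed earlier (chosen only to keep the example self-contained, so the numbers differ from the full-sample fit).

```r
x5 <- c(152.06, 161.67, 127.90, 127.92, 153.58)   # adCost, first five days
y5 <- c(19, 24, 13, 23, 22)                       # newUsers, first five days

# Closed-form least-squares estimates.
beta1 <- cov(x5, y5) / var(x5)
beta0 <- mean(y5) - beta1 * mean(x5)

# Same model fitted with lm() for comparison.
fit5 <- lm(y5 ~ x5)

c(beta0, beta1)
coef(fit5)
all.equal(unname(coef(fit5)), c(beta0, beta1))   # TRUE
```

Read at face value, the full-sample coefficients say that the model expects about 3.8 new users on a day with zero spend, and that each additional new user costs roughly 1 / 0.172736, about 5.8 euros, of extra spend.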
With predict we obtain the point on the fitted line for each value of the independent variable (adCost).
# Obtain predicted and residual values
usuarios_costes_google_adwords$predicted <- predict(new_users.lm)
usuarios_costes_google_adwords$residuals <- residuals(new_users.lm)

| adCost | newUsers | predicted | residuals |
|---|---|---|---|
| 152.06 | 19 | 30.08 | -11.08 |
| 161.67 | 24 | 31.74 | -7.74 |
| 127.90 | 13 | 25.91 | -12.91 |
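
The rows above can be reproduced directly from the two coefficients: the predicted value is the intercept plus the slope times adCost, and the residual is the observed value minus the predicted one. A sketch using the coefficients printed by coef(); the short variable names are introduced only for this example.

```r
b0 <- 3.812432   # (Intercept) from coef(new_users.lm)
b1 <- 0.172736   # adCost slope from coef(new_users.lm)

ad_cost <- c(152.06, 161.67, 127.90)
new_users <- c(19, 24, 13)

predicted <- b0 + b1 * ad_cost      # point on the fitted line
residual <- new_users - predicted   # observed minus predicted

round(predicted, 2)   # 30.08 31.74 25.91
round(residual, 2)    # -11.08 -7.74 -12.91
```

All three residuals are negative, meaning the model over-predicts new users on these first three days.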
\[MSE=\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2\]
\[RMSE=\sqrt{MSE}\]
(mean((usuarios_costes_google_adwords$newUsers - usuarios_costes_google_adwords$predicted)^2))^0.5
## [1] 8.7321
summary(new_users.lm)$sigma
## [1] 8.779429
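
The two numbers differ only in the divisor. The RMSE divides the sum of squared residuals by n, while the residual standard error reported by summary() divides by the residual degrees of freedom, n - 2 for a model with an intercept and one slope. A sketch using the values printed above (n = 186 is inferred from the 184 degrees of freedom):

```r
rmse <- 8.7321   # sqrt(mean(residuals^2)), from the output above
n <- 186         # 184 residual degrees of freedom + 2 estimated coefficients

# Same sum of squares, rescaled from a divisor of n to a divisor of n - 2.
sigma_hat <- rmse * sqrt(n / (n - 2))
round(sigma_hat, 3)   # 8.779
```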
confint(new_users.lm)
## 2.5 % 97.5 %
## (Intercept) 1.4270902 6.1977737
## adCost 0.1619436 0.1835284
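
These intervals are each estimate plus or minus a t quantile times its standard error. The sketch below rebuilds them from the rounded estimates and standard errors in the summary() table, so it agrees with confint() only to about four decimal places.

```r
# Estimates and standard errors as printed by summary(new_users.lm).
estimate <- c("(Intercept)" = 3.81243, adCost = 0.17274)
std_error <- c("(Intercept)" = 1.20903, adCost = 0.00547)
df_resid <- 184

# Critical value for a two-sided 95% interval.
t_crit <- qt(0.975, df = df_resid)

ci <- cbind(lower = estimate - t_crit * std_error,
            upper = estimate + t_crit * std_error)
round(ci, 3)
```

Neither interval contains 0, which matches the small p-values in the coefficient table.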
posibles_inversiones <- data.frame(adCost = c(50, 320))

| adCost |
|---|
| 50 |
| 320 |

predict(), now with new data:
predicciones <- as.data.frame(
  predict(object = new_users.lm,
          newdata = posibles_inversiones,
          interval = "prediction")
)
predicciones <- cbind(posibles_inversiones, predicciones)

| adCost | fit | lwr | upr |
|---|---|---|---|
| 50 | 12.44923 | -4.981462 | 29.87993 |
| 320 | 59.08795 | 41.661020 | 76.51488 |
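
The lwr and upr columns follow the standard prediction-interval formula for simple regression: fit ± t * sigma * sqrt(1 + 1/n + (x0 - mean(x))^2 / Sxx). The sketch below rebuilds them from numbers printed earlier. Sxx is not reported anywhere in the output, so it is recovered here from the slope's standard error as (sigma / SE)^2; that step and the rounded mean of adCost make the result approximate.

```r
b0 <- 3.812432          # intercept from coef()
b1 <- 0.172736          # slope from coef()
sigma_hat <- 8.779429   # residual standard error
se_slope <- 0.00547     # standard error of the slope (rounded in the output)
x_bar <- 187.09         # mean of adCost from the summary table
n <- 186                # 184 residual degrees of freedom + 2

# Sum of squared deviations of adCost, backed out from SE(slope) = sigma / sqrt(Sxx).
sxx <- (sigma_hat / se_slope)^2

x0 <- c(50, 320)
fit <- b0 + b1 * x0

# Standard error of a single new observation at x0.
se_pred <- sigma_hat * sqrt(1 + 1 / n + (x0 - x_bar)^2 / sxx)
t_crit <- qt(0.975, df = n - 2)

pred <- data.frame(adCost = x0,
                   fit = fit,
                   lwr = fit - t_crit * se_pred,
                   upr = fit + t_crit * se_pred)
round(pred, 2)
```

The negative lower bound at adCost = 50 is a consequence of the normal-error assumption; a count of users cannot fall below zero, so in practice that interval would be read as starting at 0.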